Speech translation (ST) is the task of directly translating acoustic speech signals in a source language into text in a foreign language. The ST task has long been addressed using a pipeline approach with two modules: first an Automatic Speech Recognition (ASR) system in the source language, followed by a text-to-text Machine Translation (MT) system. In the past few years, we have seen a paradigm shift towards end-to-end approaches using sequence-to-sequence deep neural network models. This paper presents our efforts towards the development of the first Broadcast News end-to-end Arabic-to-English speech translation system. Starting from independent ASR and MT LDC releases, we were able to identify about 92 hours of Arabic audio recordings for which the manual transcription was also translated into English at the segment level. These data were used to train and compare pipeline and end-to-end speech translation systems under multiple scenarios, including transfer learning and data augmentation techniques.
translated by Google Translate
Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify those changes, they still suffer from missed subtle changes, poor scalability, and/or sensitivity to noise points. To meet these challenges, we are the first to generalise the CPD problem as a special case of the Change-Interval Detection (CID) problem. We then propose a CID method, named iCID, based on a recent Isolation Distributional Kernel (IDK). iCID identifies a change interval if there is a high dissimilarity score between two non-homogeneous, temporally adjacent intervals. The data-dependent property and finite feature map of IDK enable iCID to efficiently identify various types of change points in data streams while tolerating noise points. Moreover, the proposed online and offline versions of iCID are able to optimise key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets.
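The idea of scoring two temporally adjacent intervals by their dissimilarity can be illustrated with a minimal sketch. This is not iCID itself: a plain Gaussian kernel mean embedding stands in for the Isolation Distributional Kernel, and the function names, the `window` size and the `gamma` bandwidth are all assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, gamma=1.0):
    # Stand-in point kernel (assumption); iCID uses the Isolation Kernel instead.
    return math.exp(-gamma * (x - y) ** 2)

def mean_similarity(a, b, gamma=1.0):
    # Inner product of the kernel mean embeddings of intervals a and b.
    total = sum(gaussian_kernel(x, y, gamma) for x in a for y in b)
    return total / (len(a) * len(b))

def interval_dissimilarity(a, b, gamma=1.0):
    # One minus the cosine similarity of the two mean embeddings.
    cross = mean_similarity(a, b, gamma)
    norm = math.sqrt(mean_similarity(a, a, gamma) * mean_similarity(b, b, gamma))
    return 1.0 - cross / norm

def score_stream(stream, window, gamma=1.0):
    # Score every position t by comparing the interval just before t
    # with the interval starting at t.
    scores = {}
    for t in range(window, len(stream) - window + 1):
        left = stream[t - window:t]
        right = stream[t:t + window]
        scores[t] = interval_dissimilarity(left, right, gamma)
    return scores

# Toy stream with a single abrupt level shift at index 50.
stream = [0.0] * 50 + [5.0] * 50
scores = score_stream(stream, window=10)
best = max(scores, key=scores.get)
```

On this toy stream the score is zero inside homogeneous regions and peaks where the two windows straddle the shift. A real detector would additionally need a threshold on the score and a way to choose the window size.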
Training a Machine Learning model requires sufficient data. The sufficiency of the data is not only a matter of quantity, but also of relevance and reduced redundancy. Data-generating processes create massive amounts of data. When used raw, such big data consumes substantial computational resources. Instead of the raw data, a proper Condensed Representation can be used. By combining K-means, a well-known clustering method, with some correction and refinement facilities, a novel Condensed Representation method for Machine Learning applications is introduced. To present the novel method meaningfully and visually, synthetically generated data is employed. It is shown that, by using the condensed representation instead of the raw data, acceptably accurate model training is possible.
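A minimal sketch of the K-means part of such a condensation is shown below: the raw points are replaced by cluster centroids, each weighted by the number of points it represents. The correction and refinement steps mentioned in the abstract are not described there and are not reproduced; the function name `kmeans_condense` and the deterministic initialisation are assumptions made for illustration.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    # Group points by the index of their nearest centroid.
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda j: squared_distance(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters

def kmeans_condense(points, k, iterations=20):
    # Deterministic initialisation (assumption): evenly spaced input points.
    n = len(points)
    centroids = [tuple(points[i * n // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = assign(points, centroids)
        updated = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                updated.append(tuple(sum(m[d] for m in members) / len(members)
                                     for d in range(dim)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    # Condensed representation: (centroid, weight) pairs.
    clusters = assign(points, centroids)
    return [(c, len(members)) for c, members in zip(centroids, clusters)]

raw = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
condensed = kmeans_condense(raw, k=2)
```

A downstream model could then be trained on the centroids, using the counts as sample weights, rather than on all raw points.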
Models based on Ordinary Differential Equations (ODEs) have become popular foundation models for solving many time-series problems. Combining neural ODEs with traditional RNN models has provided the best representation for irregular time series. However, ODE-based models require the trajectory of hidden states to be defined based on the initial observed value or the last available observation. This raises questions about for how long the generated hidden state remains sufficient and whether it is effective when long sequences are used instead of the typically used shorter sequences. In this article, we introduce CrossPyramid, a novel ODE-based model that aims to enhance the generalizability of sequence representation. CrossPyramid does not rely only on the hidden state from the last observed value; it also considers ODE latent representations learned from other samples. The main idea of our proposed model is to define the hidden state for the unobserved values based on the non-linear correlation between samples. Accordingly, CrossPyramid is built with three distinctive parts: (1) an ODE Auto-Encoder to learn the best data representation; (2) a Pyramidal attention method to categorize the learned representations (hidden states) based on the relationship characteristics between samples; and (3) a Cross-level ODE-RNN to integrate the previously learned information and provide the final latent state for each sample. Through extensive experiments on partially observed synthetic and real-world datasets, we show that the proposed architecture can effectively model the long gaps in intermittent series and outperforms state-of-the-art approaches. The results show an average improvement of 10% on univariate and multivariate datasets for both forecasting and classification tasks.
Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.
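How the preferred system can change with the efficiency budget can be sketched as a simple constrained selection. Everything in the example is hypothetical: the system names, the accuracy, latency and cost figures are placeholders rather than measurements from MS MARCO or XOR-TyDi, and `best_under_budget` is an illustrative helper, not part of any benchmark toolkit.

```python
from dataclasses import dataclass

@dataclass
class SystemResult:
    name: str
    accuracy: float       # any quality metric where higher is better
    latency_ms: float     # query latency on the reference hardware
    cost_per_hour: float  # cost of that hardware

def best_under_budget(results, max_latency_ms, max_cost_per_hour):
    # Keep only systems that satisfy both efficiency constraints,
    # then pick the most accurate of those (None if nothing qualifies).
    feasible = [r for r in results
                if r.latency_ms <= max_latency_ms
                and r.cost_per_hour <= max_cost_per_hour]
    return max(feasible, key=lambda r: r.accuracy) if feasible else None

# Placeholder numbers for illustration only, not reported results.
results = [
    SystemResult("sparse-baseline", 0.20, 20.0, 0.5),
    SystemResult("dense-retriever", 0.30, 60.0, 1.5),
    SystemResult("reranker-stack", 0.40, 400.0, 4.0),
]

tight = best_under_budget(results, max_latency_ms=50, max_cost_per_hour=1.0)
loose = best_under_budget(results, max_latency_ms=500, max_cost_per_hour=5.0)
```

Under the tight budget only the cheapest system qualifies, while the loose budget admits the most accurate one, so a leaderboard that reports accuracy alone would hide this trade-off.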
Predicting the health risks of patients using Electronic Health Records (EHR) has attracted considerable attention in recent years, especially with the development of deep learning techniques. Health risk refers to the probability of the occurrence of a specific health outcome for a specific patient. The predicted risks can be used to support decision-making by healthcare professionals. EHRs are structured patient journey data. Each patient journey contains a chronological set of clinical events, and each clinical event contains a set of clinical/medical activities. Due to variations in patient conditions and treatment needs, EHR patient journey data has an inherently high degree of missingness, which itself carries important information affecting relationships among variables, including time. Existing deep learning-based models generate imputed values for missing values when learning these relationships. However, imputed values may distort the clinical meaning of the original EHR patient journey data, resulting in classification bias. This paper proposes a novel end-to-end approach to modeling EHR patient journey data with Integrated Convolutional and Recurrent Neural Networks. Our model can capture both long- and short-term temporal patterns within each patient journey and effectively handle the high degree of missingness in EHR data without generating any imputed data. Extensive experimental results using the proposed model on two real-world datasets demonstrate robust performance as well as superior prediction accuracy compared to existing state-of-the-art imputation-based prediction methods.
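Among the challenges not yet resolved for Counterfactual Explanations (CE) are stability, the synthesis of the various CEs, and the lack of plausibility/sparsity guarantees. From a more practical point of view, recent studies show that the prescribed counterfactual recourses are often not implemented exactly by individuals, and demonstrate that most state-of-the-art CE algorithms are very likely to fail in this noisy environment. To address these issues, we propose a probabilistic framework that gives a sparse local counterfactual rule for each observation: instead of giving diverse CEs, we provide rules that give a range of values that can change the decision with a given high probability. In addition, the recourses derived from these rules are robust by construction. These local rules are aggregated into regional counterfactual rules to ensure the stability of the counterfactual explanations across observations. Our local and regional rules guarantee that the recourses are faithful to the data distribution, because our rules use a consistent estimator of the probabilities of changing the decision based on a Random Forest. In addition, these probabilities give interpretable and sparse rules, as we select the smallest set of variables having a given probability of changing the decision. Code for computing the counterfactual rules is available, and we compare their relevance with standard CE and recent similar attempts.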
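We present and discuss a runtime architecture that integrates sensory data and classifiers with a logic-based decision-making system, in the context of an e-Health system for the rehabilitation of children with neuromotor disorders. In this application, children perform a rehabilitation task in the form of games. The main aim of the system is to derive a set of parameters describing the child's current level of cognitive and behavioral performance (e.g., engagement, attention, task accuracy) from the available sensors and classifiers (e.g., eye trackers, motion sensors, emotion recognition techniques) and to take decisions accordingly. These decisions are typically aimed at improving the child's performance by triggering appropriate re-engagement stimuli when attention is low, or by changing the game or making it more difficult when the child is losing interest in the task because it is too easy. Alongside state-of-the-art techniques for emotion recognition and head pose estimation, we use a runtime variant of a probabilistic and epistemic logic programming dialect of the Event Calculus, known as the Epistemic Probabilistic Event Calculus. In particular, the probabilistic component of this symbolic framework allows for a natural interface with machine learning techniques. We give an overview of the architecture and its components, and show some of its characteristics through a discussion of a running example and experiments. Under consideration for publication in Theory and Practice of Logic Programming (TPLP).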
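We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL$\cdot$E 2. Our main result is that, assuming accurate score estimates, such SGMs can efficiently sample from essentially any realistic data distribution. In contrast to prior works, our results (1) hold for an $L^2$-accurate score estimate (rather than $L^\infty$-accurate); (2) do not require restrictive functional inequality conditions that preclude substantial non-log-concavity; (3) scale polynomially in all relevant problem parameters; and (4) match state-of-the-art complexity guarantees for discretization of the Langevin diffusion, provided that the score error is sufficiently small. We view this as strong theoretical justification for the empirical success of SGMs. We also examine SGMs based on the critically damped Langevin diffusion (CLD). Contrary to conventional wisdom, we provide evidence that the use of CLD does not reduce the complexity of SGMs.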
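In this paper, we present a new pipeline that leverages language foundation models for temporal sequential pattern mining, such as human mobility forecasting tasks. For example, in the task of predicting Place-of-Interest (POI) customer flows, the number of visits is typically extracted from historical logs, and only the numerical data are used to predict visitor flows. In this research, we perform the forecasting task directly on natural language input that includes all kinds of information, such as numerical values and contextual semantic information. Specific prompts are introduced to transform numerical temporal sequences into sentences so that existing language models can be directly applied. We design an AuxMobLCast pipeline for predicting the number of visitors in each POI, integrating an auxiliary POI category classification task with the encoder-decoder architecture. This research provides empirical evidence of the effectiveness of the proposed AuxMobLCast pipeline in discovering sequential patterns in mobility forecasting tasks. The results, evaluated on three real-world datasets, demonstrate that pre-trained language foundation models also perform well in forecasting temporal sequences. This study could provide visionary insights and lead to new research directions for predicting human mobility.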
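The step of turning a numerical visit series into a sentence that a language model can consume can be sketched as follows. The wording of the template, the function name `series_to_prompt` and the example POI are assumptions made for illustration; the abstract does not give the actual prompts used by the pipeline.

```python
def series_to_prompt(poi_name, poi_category, visits):
    # Render the numeric history as a comma-separated list inside a sentence.
    history = ", ".join(str(v) for v in visits)
    return (
        f"{poi_name} is a {poi_category}. "
        f"Over the last {len(visits)} days, there were {history} visitors. "
        "How many visitors will there be on the next day?"
    )

# Hypothetical POI and counts, for illustration only.
prompt = series_to_prompt("Central Library", "library", [12, 15, 9, 20])
```

A sequence-to-sequence language model would then be asked to generate the answer sentence, from which the predicted count is parsed back into a number.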